{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "In this notebook, we re-analyze data from a published study of the response of lung cancer cells to the tyrosine kinase inhibitor erlotinib (PMID: 25404012). Erlotinib is used as a therapeutic agent in lung cancer patients who carry mutations in the epidermal growth factor receptor (EGFR). Patients initially respond well to the drug but inevitably develop resistance. We focus on 243 phosphorylation sites in 194 proteins that were significantly upregulated by treatment with the EGFR ligand epidermal growth factor (EGF) and downregulated by erlotinib. These sites are likely targets of EGFR-regulated pathways that are inhibited by the drug. We retrieve kinases for these sites from iPTMnet using pyiptmnet and then compute basic statistics on the results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieving Kinase Information\n",
    "In this part, we use pyiptmnet to retrieve kinases from iPTMnet for the EGFR/erlotinib-regulated sites, and we write the table of kinase-site relationships to a file (Supplementary Data 2.txt). The sites are listed in the input file Supplementary Data 1.txt, which has three tab-delimited columns: the UniProt accession (AC) of the phosphorylated protein, the amino acid residue of the phosphorylated site, and the position of the phosphorylated site (e.g., P12345 S 100)."
   ]
  },
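  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It can be useful to check the input format described above before querying iPTMnet. The next cell is a minimal sketch, not part of the original analysis, of a parser for tab-delimited `UniProtAC<TAB>residue<TAB>position` lines. The helper name `parse_site_lines` and the example rows are hypothetical. The check assumes that only serine (S), threonine (T), and tyrosine (Y) sites are expected. It does not validate the UniProt accession against UniProt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not part of pyiptmnet): parse and sanity-check lines in the\n",
    "# three-column, tab-delimited site format, assuming only S, T and Y sites\n",
    "def parse_site_lines(lines):\n",
    "    sites = []\n",
    "    for line_no, line in enumerate(lines, start=1):\n",
    "        line = line.strip()  # remove surrounding whitespace and newline\n",
    "        if not line:\n",
    "            continue  # skip blank lines\n",
    "        fields = line.split('\\t')\n",
    "        if len(fields) != 3:\n",
    "            raise ValueError('line {}: expected 3 columns, got {}'.format(line_no, len(fields)))\n",
    "        accession, residue, position = fields\n",
    "        if residue not in ('S', 'T', 'Y'):\n",
    "            raise ValueError('line {}: unexpected residue {!r}'.format(line_no, residue))\n",
    "        if not position.isdigit() or int(position) == 0:\n",
    "            raise ValueError('line {}: position must be a positive integer'.format(line_no))\n",
    "        sites.append((accession, residue, int(position)))\n",
    "    return sites\n",
    "\n",
    "# Hypothetical example rows in the documented format\n",
    "parse_site_lines(['P12345\\tS\\t100', 'P00533\\tY\\t1197'])"
   ]
  },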
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pyiptmnet.api as api"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>enz_name</th>\n",
       "      <th>enz_id</th>\n",
       "      <th>sub_name</th>\n",
       "      <th>sub_id</th>\n",
       "      <th>ptm_type</th>\n",
       "      <th>site</th>\n",
       "      <th>site_position</th>\n",
       "      <th>score</th>\n",
       "      <th>source</th>\n",
       "      <th>pmids</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>238</th>\n",
       "      <td>ABL1</td>\n",
       "      <td>P00519</td>\n",
       "      <td>EGFR</td>\n",
       "      <td>P00533</td>\n",
       "      <td>Phosphorylation</td>\n",
       "      <td>Y1197</td>\n",
       "      <td>1197</td>\n",
       "      <td>3</td>\n",
       "      <td>neXtProt,PSP,Signor</td>\n",
       "      <td>16943190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>277</th>\n",
       "      <td>AKT1</td>\n",
       "      <td>P31749</td>\n",
       "      <td>EIF4B</td>\n",
       "      <td>P23588</td>\n",
       "      <td>Phosphorylation</td>\n",
       "      <td>S422</td>\n",
       "      <td>422</td>\n",
       "      <td>3</td>\n",
       "      <td>neXtProt,Signor</td>\n",
       "      <td>18836482</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>280</th>\n",
       "      <td>AKT1</td>\n",
       "      <td>P31749</td>\n",
       "      <td>PDCD4</td>\n",
       "      <td>Q53EL6</td>\n",
       "      <td>Phosphorylation</td>\n",
       "      <td>S457</td>\n",
       "      <td>457</td>\n",
       "      <td>3</td>\n",
       "      <td>neXtProt,PSP,Signor</td>\n",
       "      <td>16357133</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>279</th>\n",
       "      <td>AKT1</td>\n",
       "      <td>P31749</td>\n",
       "      <td>FLNC</td>\n",
       "      <td>Q14315</td>\n",
       "      <td>Phosphorylation</td>\n",
       "      <td>S2233</td>\n",
       "      <td>2233</td>\n",
       "      <td>3</td>\n",
       "      <td>neXtProt,PSP</td>\n",
       "      <td>15461588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>278</th>\n",
       "      <td>AKT1</td>\n",
       "      <td>P31749</td>\n",
       "      <td>IRS1</td>\n",
       "      <td>P35568</td>\n",
       "      <td>Phosphorylation</td>\n",
       "      <td>S629</td>\n",
       "      <td>629</td>\n",
       "      <td>3</td>\n",
       "      <td>neXtProt,PSP</td>\n",
       "      <td>17640984</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    enz_name  enz_id sub_name  sub_id         ptm_type   site  site_position  \\\n",
       "238     ABL1  P00519     EGFR  P00533  Phosphorylation  Y1197           1197   \n",
       "277     AKT1  P31749    EIF4B  P23588  Phosphorylation   S422            422   \n",
       "280     AKT1  P31749    PDCD4  Q53EL6  Phosphorylation   S457            457   \n",
       "279     AKT1  P31749     FLNC  Q14315  Phosphorylation  S2233           2233   \n",
       "278     AKT1  P31749     IRS1  P35568  Phosphorylation   S629            629   \n",
       "\n",
       "     score               source     pmids  \n",
       "238      3  neXtProt,PSP,Signor  16943190  \n",
       "277      3      neXtProt,Signor  18836482  \n",
       "280      3  neXtProt,PSP,Signor  16357133  \n",
       "279      3         neXtProt,PSP  15461588  \n",
       "278      3         neXtProt,PSP  17640984  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
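    "# Retrieve the enzymes (kinases) reported in iPTMnet for every site in the input file\n",
    "kinase_info = api.get_ptm_enzymes_from_file(\"Supplementary Data 1.txt\")\n",
    "# Sort the kinase-site table alphabetically by kinase name\n",
    "kinase_info = kinase_info.sort_values(by=\"enz_name\")\n",
    "kinase_info.head()"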
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the sorted kinase-site table to a tab-delimited file\n",
    "kinase_info.to_csv(\"Supplementary Data 2.txt\", sep='\\t')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "337"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the number of kinase-site pairs (one row per pair)\n",
    "num_kinase_site_pairs = len(kinase_info)\n",
    "num_kinase_site_pairs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Statistics\n",
    "\n",
    "Next, we compute:\n",
    "\n",
    "* Number of kinase-site pairs\n",
    "* Number of sites with at least one kinase\n",
    "* Number of kinases\n",
    "* Number of sites per kinase\n",
    "* Number of kinases that phosphorylate three or more sites"
   ]
  },
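  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The five statistics listed above can be derived from the kinase-site table with a few pandas operations. The next cell is a self-contained sketch on a small made-up table. The table `toy`, its kinase names, and its substrate IDs are hypothetical placeholders, not iPTMnet results. It uses the same column names as `kinase_info`.\n",
    "\n",
    "This sketch differs from the cells below in one design choice. It counts distinct kinase-site pairs with `drop_duplicates` and distinct sites per kinase with `groupby(...).nunique()`. As a result, a pair that appears in more than one row (as the last two toy rows do on purpose) is counted once. By contrast, `len()` and `value_counts()` count rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table using the same columns as kinase_info;\n",
    "# the last two rows repeat the same kinase-site pair on purpose\n",
    "toy = pd.DataFrame({\n",
    "    'enz_name': ['KinA', 'KinA', 'KinA', 'KinB', 'KinB', 'KinC', 'KinC'],\n",
    "    'enz_id':   ['K1', 'K1', 'K1', 'K2', 'K2', 'K3', 'K3'],\n",
    "    'sub_id':   ['SUB1', 'SUB2', 'SUB3', 'SUB1', 'SUB4', 'SUB4', 'SUB4'],\n",
    "    'site':     ['S10', 'T20', 'Y30', 'S10', 'Y40', 'Y40', 'Y40'],\n",
    "})\n",
    "\n",
    "def basic_stats(df, min_sites=3):\n",
    "    # A site is the substrate accession plus residue/position, e.g. 'SUB1 S10'\n",
    "    full_site = df['sub_id'] + ' ' + df['site']\n",
    "    # Distinct sites per kinase, most frequent first\n",
    "    sites_per_kinase = (df.assign(full_site=full_site)\n",
    "                          .groupby('enz_name')['full_site']\n",
    "                          .nunique()\n",
    "                          .sort_values(ascending=False))\n",
    "    return {\n",
    "        'kinase_site_pairs': len(df.drop_duplicates(subset=['enz_id', 'sub_id', 'site'])),\n",
    "        'sites_with_kinase': full_site.nunique(),\n",
    "        'kinases': df['enz_id'].nunique(),\n",
    "        'sites_per_kinase': sites_per_kinase,\n",
    "        'kinases_with_min_sites': int((sites_per_kinase >= min_sites).sum()),\n",
    "    }\n",
    "\n",
    "stats = basic_stats(toy)\n",
    "stats"
   ]
  },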
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find the number of sites with at least one kinase\n",
    "# A site is identified by the substrate's UniProt AC plus its residue and position (e.g., \"P00533 Y1197\")\n",
    "kinase_info[\"full_site\"] = kinase_info[\"sub_id\"] + \" \" + kinase_info[\"site\"]\n",
    "unique_sites = kinase_info.drop_duplicates(subset=\"full_site\")\n",
    "len(unique_sites)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "53"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the number of unique kinases (by UniProt AC)\n",
    "unique_kinases = kinase_info.drop_duplicates(subset=\"enz_id\")\n",
    "len(unique_kinases)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PRKCA       6\n",
       "RPS6KB1     6\n",
       "PRKCD       5\n",
       "MAPK1       5\n",
       "RPS6KA1     5\n",
       "PRKACA      4\n",
       "AKT1        4\n",
       "MAP2K2      4\n",
       "EGFR        4\n",
       "PLK1        4\n",
       "CHEK1       4\n",
       "MAPK3       3\n",
       "AURKB       3\n",
       "PRKCE       3\n",
       "PRKD1       3\n",
       "RPS6KA3     3\n",
       "MAP2K1      3\n",
       "SRC         2\n",
       "JAK2        2\n",
       "PDPK1       2\n",
       "BRAF        2\n",
       "MAP3K8      2\n",
       "RPS6KA5     2\n",
       "PRKCH       2\n",
       "PAK2        2\n",
       "CSNK2A1     2\n",
       "PRKCZ       2\n",
       "ROCK1       2\n",
       "CAMK2A      2\n",
       "IKBKB       2\n",
       "PRKD3       2\n",
       "ABL1        1\n",
       "SGK1        1\n",
       "DAPK1       1\n",
       "PTK6        1\n",
       "HCK         1\n",
       "MAPKAPK5    1\n",
       "MTOR        1\n",
       "MKNK1       1\n",
       "PAK1        1\n",
       "ROCK2       1\n",
       "EEF2K       1\n",
       "MAPK8       1\n",
       "RPS6KA4     1\n",
       "PASK        1\n",
       "MAPK13      1\n",
       "PKD1        1\n",
       "RET         1\n",
       "AKT2        1\n",
       "MAPK14      1\n",
       "INSR        1\n",
       "LCK         1\n",
       "Name: enz_name, dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the number of sites per kinase (rows per kinase name, most frequent first)\n",
    "kinase_tally_sorted = kinase_info[\"enz_name\"].value_counts()\n",
    "kinase_tally_sorted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the number of kinases that phosphorylate three or more sites\n",
    "high_freq_kinases = kinase_tally_sorted[kinase_tally_sorted >= 3]\n",
    "num_high_freq_kinases = len(high_freq_kinases)\n",
    "num_high_freq_kinases"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
